Abstract — In problems of lossy source/noisy channel coding 
with side information, the theoretical bounds are achieved using 
"good" source/channel codes that can be partitioned into "good" 
channel/source codes. A scheme that achieves optimality in 
channel coding with side information at the encoder using 
independent channel and source codes was outlined in previous 
works. In practice, the original problem is transformed into a 
multiple-access problem in which the superposition of the two 
independent codes can be decoded using successive interference 
cancellation. Inspired by this work, we analyze the superposition 
approach for source coding with side information at the decoder. 
We present a random coding analysis that shows achievability of 
the Wyner-Ziv bound. Then, we discuss some issues related to 
the practical implementation of this method. 

I. Introduction 

The problem of source coding in the presence of correlated
side information at the decoder which is not accessible
by the encoder has many applications. In particular, it arises
in sensor networks where the power for communication
is severely constrained [1], and in sequential source coding
of correlated data when for some reason (e.g. encoding
complexity or robustness against transmission errors) data must be
encoded separately, as in video coding [2].

In the lossless case, the minimum achievable transmission 
rate was found by Slepian and Wolf, who showed that there 
is no rate penalty with respect to the case in which the side 
information is also available at the encoder [3]. In the lossy
case, where the goal is to minimize the transmission rate under
a given distortion constraint on the reconstructed source, the
rate-distortion function was found by Wyner and Ziv [4]. In
the latter case, in general, a rate penalty may occur in having 
the side information only at the decoder. In both cases, the 
achievability of these rate bounds was shown using randomly 
generated codes whose codewords are randomly grouped into 
bins. 

In practice, codes having some structure must be used. 
In the Slepian-Wolf problem (lossless/near-lossless case) all
"good" practical codes currently employed in channel coding
(such as turbo [5] and LDPC [6] codes) induce a partition
of the set of all source outcomes that is in fact a very good
binning. Hence, the use of these codes leads to practical
schemes with performance close to the theoretical bounds [7].
Similar results may also be obtained in lossless multiterminal
network coding [8]. In the Wyner-Ziv problem (lossy coding)
practical code design is more difficult. Zamir et al. showed
that nested linear codes (in the binary case) or nested lattice
codes (in the continuous case) asymptotically achieve the
rate-distortion bound [9]. Similarly, the same structures are shown
to asymptotically achieve the capacity-cost function in the
channel coding problem where side information regarding the
channel state is non-causally available at the encoder but not
at the decoder [10]. In practice, however, finite-dimensional
linear/lattice codes must be used, as proposed in [11] or [12], which
limits the achievable performance. In [12], a scheme
is proposed where finite-dimensional nested quantization on
nested lattices is followed by a second Slepian-Wolf "binning"
stage in order to limit the performance loss. At high rates, it
is shown that the scheme performs as conventional
entropy-coded lattice quantization with side information also known
at the encoder. Similar solutions can also be employed in
general multiterminal source code design [13].

For the case of channel coding with additive noise and 
an additive interference signal non-causally known at the 
encoder but not at the decoder (i.e. the so-called dirty paper
problem [14]), a practical scheme has recently been
proposed [15] which achieves the capacity-cost function using
two independent rather than nested codes. In particular, the
original problem is transformed into an equivalent
multiple-access problem in which the superposition of the two codes
is decoded using successive interference cancellation. Very
good performance is shown for both the binary [16] and the
Gaussian [17] setting. Since the codes can be independently 
designed, each one of them can be specifically tailored for the 
purpose it serves. In fact, one of them mainly serves as source 
code (on which we must be able to perform quantization), 
while the other serves as channel code (on which we must 
apply some channel-decoding operations with performance 
close to the one of joint-typicality used in random analysis). 

Since the problem of lossy source coding with side in- 
formation is "dual" to the problem of noisy channel coding 
with side information, it may seem straightforward that a 
similar superposition approach with independent codes may 
be effectively used in the Wyner-Ziv setting. However, the
encoder/decoder of one problem functionally corresponds to
the decoder/encoder of the other one only under certain
hypotheses [18]. For example, in the binary setting, the two
problems are not exactly duals in the sense discussed in [18].
Moreover, functional duality assumes that both encoder and
decoder perform an exact joint-typicality operation, which in
practice is not the case. For example, belief propagation in
traditional channel decoding of turbo/LDPC codes roughly
corresponds to joint typicality only if the input to the decoder
is "close" to an actual codeword, but not in general. Hence, in
this paper, we aim to analyze the superposition approach for 
the Wyner-Ziv problem without relying on duality. 

The rest of this paper is organized as follows. In Section
II we review the superposition coding approach for the dirty
paper problem. In Section III we analyze the performance of
superposition coding for source coding with side information
from a random coding perspective. The result of this analysis
is then particularized for the binary and the Gaussian case.
The issues involved in a practical implementation of this
method are discussed in Section IV. Section V summarizes
our conclusions.

II. Superposition Coding for Writing on Dirty Paper

Consider the additive memoryless channel 

Y = X + S + Z , 

where X is the input to the channel, S is an interference signal 
known (non-causally) to the encoder and independent from X, 
Z is an unknown channel noise independent from X and S, 
and Y is the channel output. Assume that the channel is used 
$n$ times, without feedback; $X^n$, $S^n$, $Z^n$, and $Y^n$ denote the
involved random vectors.

In the binary case the alphabet over which the random
variables take values is $\mathcal{A} = GF(2)$, "+" indicates the sum
over the field $GF(2)$, $S \sim \mathcal{B}(1/2)$ (i.e. is distributed as
Bernoulli-1/2), $Z \sim \mathcal{B}(p)$, and $X$ is subject to the cost
constraint $E[d_H(X^n, 0^n)/n] \le W$, where $d_H(\cdot,\cdot)$ is the
Hamming distance. Two codes $C_0 \subset \mathcal{A}^n$ and $C_1 \subset \mathcal{A}^n$
are constructed at rates $R_0$ and $R_1$ (bit/symbol) by random
i.i.d. selection according to distributions $\mathcal{B}(1/2)$ and $\mathcal{B}(q)$,
respectively.

Given $x^n$ and a code $C$, $T_C(x^n)$ denotes a codeword of
$C$ which is (strongly) jointly typical with $x^n$ if there exists
at least one; otherwise it is a random codeword of $C$. Define
$[x^n]_C = T_C(x^n) - x^n$. The encoder selects a codeword $c_1^n \in C_1$
and sends the sequence

$$x^n = [s^n - c_1^n]_{C_0} .$$

If $T_{C_0}(s^n - c_1^n) = c_0^n \in C_0$, the channel output is

$$y^n = c_0^n - (s^n - c_1^n) + s^n + z^n = c_0^n + c_1^n + z^n .$$

The decoder computes the pair $(\hat{c}_0^n, \hat{c}_1^n) \in C_0 \times C_1$ that
is jointly typical with $y^n$, announcing $\hat{c}_1^n$ as the decoded
message.
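
As an illustration of the encoding identity above, the following toy sketch (not part of the original analysis) replaces the joint-typicality operation with a minimum Hamming distance search over small random codes. The block length, the code sizes, and the helper names `xor`, `hamming`, `nearest`, and `encode` are assumptions made only for this example. It shows that, whatever codeword of the first code is selected, the channel output reduces to the superposition of the two codewords plus the noise.

```python
import random

def xor(a, b):
    """Component-wise sum over GF(2)."""
    return [u ^ v for u, v in zip(a, b)]

def hamming(a, b):
    """Hamming distance between two binary vectors."""
    return sum(u != v for u, v in zip(a, b))

def nearest(code, v):
    """Minimum-distance search, used here in place of joint typicality."""
    return min(code, key=lambda c: hamming(c, v))

def encode(c1, s, code0):
    """Return (x, c0), where x = T_{C0}(s - c1) - (s - c1)."""
    target = xor(s, c1)          # over GF(2), subtraction equals addition
    c0 = nearest(code0, target)
    return xor(c0, target), c0

rng = random.Random(0)
n = 12                                                  # toy block length
code0 = [[rng.randint(0, 1) for _ in range(n)] for _ in range(64)]
code1 = [[int(rng.random() < 0.2) for _ in range(n)] for _ in range(4)]
s = [rng.randint(0, 1) for _ in range(n)]               # interference
z = [int(rng.random() < 0.05) for _ in range(n)]        # channel noise

c1 = code1[2]                    # message codeword (arbitrary choice)
x, c0 = encode(c1, s, code0)
y = xor(xor(x, s), z)            # channel: Y = X + S + Z

# The interference cancels: y equals c0 + c1 + z.
print(y == xor(xor(c0, c1), z))
```

The Hamming weight of `x` equals the distance between `s - c1` and its nearest codeword, which is why the first code has to behave as a good quantizer for the cost constraint to be met.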

By random analysis over all possible tuples of codes (see
[15]), if $R_0 > R_{1/2}(W)$, where $R_{1/2}(W) = 1 - H(W)$
is the Hamming-distortion rate-distortion function [19] of a
binary symmetric source, the average probability of having
$X^n$ violate the cost constraint $W$ (encoder error) approaches
zero as $n \to \infty$.¹ In addition, if $R_1 < C_p(q)$ and
$R_0 + R_1 < C_p(1/2)$, where $C_p(q) = H(p * q) - H(p)$
(with $p * q = p(1 - q) + q(1 - p)$) is the Hamming-weight
cost-capacity function of a binary symmetric channel with error
probability $p$, the probability of having $(\hat{c}_0^n, \hat{c}_1^n) \ne (c_0^n, c_1^n)$
(decoder error) vanishes as $n \to \infty$.² If $q^*$ is chosen
such that $p * q^* = W$, then reliable transmission (i.e. with
neither encoder nor decoder error) is possible at rates arbitrarily close
to $R_1^* = H(W) - H(p)$, which equals the cost-capacity
function with side information at the encoder of the considered
channel, at least for all constraints $1 - 2^{-H(p)} \le W \le 1/2$.

¹ Note that in fact $S^n - C_1^n$ is a binary symmetric source.
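
The rate conditions of the binary case can be evaluated numerically. The sketch below is only an illustration under assumed values of $p$ and $W$; the function names are hypothetical. It solves $p * q^* = W$ for $q^*$ and checks that the corner point $R_0 = 1 - H(W)$, $R_1 = H(W) - H(p)$ sits exactly on the sum-rate limit $C_p(1/2) = 1 - H(p)$.

```python
from math import log2

def H(u):
    """Binary entropy in bits."""
    if u in (0.0, 1.0):
        return 0.0
    return -u * log2(u) - (1 - u) * log2(1 - u)

def conv(p, q):
    """Binary convolution p * q = p(1 - q) + q(1 - p)."""
    return p * (1 - q) + q * (1 - p)

def C(p, q):
    """C_p(q) = H(p * q) - H(p)."""
    return H(conv(p, q)) - H(p)

def q_star(p, W):
    """Solve p * q = W for q (requires p < 1/2)."""
    return (W - p) / (1 - 2 * p)

p, W = 0.1, 0.3              # illustrative values, not from the paper
q = q_star(p, W)
R0_min = 1 - H(W)            # encoder condition: R0 > R_{1/2}(W)
R1_max = C(p, q)             # decoder condition: R1 < C_p(q*)
sum_max = C(p, 0.5)          # decoder condition: R0 + R1 < C_p(1/2)

print(f"q* = {q:.4f}, R0 > {R0_min:.4f}, R1 < {R1_max:.4f}, sum < {sum_max:.4f}")
```

With these values the constraint $W$ also satisfies $1 - 2^{-H(p)} \le W \le 1/2$, so the rate $H(W) - H(p)$ is the one quoted in the text.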

In the continuous case the alphabet over which the random
variables take values is $\mathcal{A} = [-A/2, A/2) \simeq \mathbb{R}/A\mathbb{Z}$ (for some
$A > 0$), "+" indicates the modulo-$A$ sum, $Z \sim \mathcal{N}_A(0, P_Z)$
(i.e. is distributed as an $A$-aliased Gaussian variable with zero
mean and variance $P_Z$), and $X$ is subject to the cost constraint
$E[\|X^n\|^2/n] \le P_X$, where $\|\cdot\|^2$ is the squared Euclidean norm.
The two codes $C_0 \subset \mathcal{A}^n$ and $C_1 \subset \mathcal{A}^n$ are constructed at rates
$R_0$ and $R_1$ (bit/symbol) by random i.i.d. selection according
to distributions $\mathcal{U}[-A/2, A/2)$ (i.e. uniform) and $\mathcal{N}_A(0, Q)$,
respectively.

Assume that $D$ is a dither signal known to both encoder and
decoder, drawn according to $\mathcal{U}[-A/2, A/2)$, and independent
from all other variables. The encoder selects a codeword
$c_1^n \in C_1$ and sends the sequence

$$x^n = [\alpha s^n - c_1^n + d^n]_{C_0} .$$

If $T_{C_0}(\alpha s^n - c_1^n + d^n) = c_0^n \in C_0$, the channel output is

$$y^n = c_0^n - (\alpha s^n - c_1^n + d^n) + s^n + z^n .$$

The decoder first computes

$$\tilde{y}^n = \alpha y^n + d^n = c_0^n + c_1^n + \underbrace{[\alpha z^n - (1 - \alpha) x^n]}_{\tilde{z}^n} ,$$

and then finds the pair $(\hat{c}_0^n, \hat{c}_1^n) \in C_0 \times C_1$ that is
jointly typical with $\tilde{y}^n$, announcing $\hat{c}_1^n$ as the decoded message.
The choice $\alpha = P_X/(P_X + P_Z)$ minimizes the power of the equivalent
noise in this virtual multiple-access channel (MAC), producing
$P_{\tilde{Z}} = \alpha P_Z$.

Again, if in $C_0$ there are enough codewords, it is possible to
have a vanishing probability of encoding errors; in addition, if
in $C_1$ and in $C_0 + C_1$ there are not too many codewords, it is
possible to have a vanishing probability of decoding errors. In
particular when $A \to \infty$ (i.e. when the channel is AWGN), if
$Q^*$ is chosen such that $P_{\tilde{Z}} + Q^* = P_X$ (with $P_{\tilde{Z}}$ the
power of the equivalent noise), then reliable
transmission (i.e. with neither encoder nor decoder error) is possible at
rates arbitrarily close to $R_1^* = (1/2) \log_2(1 + P_X/P_Z)$, which
equals the cost-capacity function with side information at the
encoder of the considered channel [14]. Details are given in

[15].
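
The Gaussian quantities above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the input is taken to have power $P_X$ and to be uncorrelated with the noise, the standard AWGN capacity formula is applied to the second code with signal power $Q^*$, and the numerical values and function names are hypothetical.

```python
from math import log2

def mmse_scaling(Px, Pz):
    """alpha = Px / (Px + Pz)."""
    return Px / (Px + Pz)

def equivalent_noise_power(alpha, Px, Pz):
    """Power of alpha*Z - (1 - alpha)*X for uncorrelated X and Z."""
    return alpha ** 2 * Pz + (1 - alpha) ** 2 * Px

Px, Pz = 4.0, 1.0                                # illustrative powers
alpha = mmse_scaling(Px, Pz)
P_eq = equivalent_noise_power(alpha, Px, Pz)     # should equal alpha * Pz
Q_star = Px - P_eq                               # from P_eq + Q* = Px
R1 = 0.5 * log2(1 + Q_star / P_eq)               # AWGN rate of the second code

print(alpha, P_eq, Q_star, R1)
```

Under these assumptions `R1` coincides with $(1/2)\log_2(1 + P_X/P_Z)$, which is independent of the interference power.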

Note that in both cases the two codes play very different
roles. The code $C_1$ must be designed to be a good
channel code with respect to the virtual MAC arising in the
decoding operation. The code $C_0$ has instead the dual requirement
of being both a good source code, in order to avoid the encoder
error, and a good channel code, for effective MAC decoding.
However, since we are not interested in exactly decoding the
correct $c_0^n$, one should mainly be concerned with choosing a code
$C_0$ which is good for source coding purposes. The same
considerations arise when analyzing this superposition scheme
from a practical perspective. In practice, in source encoding the
joint-typicality operation is replaced by maximum likelihood
decoding, and structured codes which allow for exhaustive
search over all codewords (e.g. trellis codes) are employed;
in channel decoding the joint-typicality operation is
approximated using some belief propagation algorithm conducted on
codes with a higher degree of randomness (e.g. turbo or LDPC
codes). For these reasons, in practical schemes where $C_0$ is a
trellis code, $C_1$ is a turbo code, and iterative algorithms are
employed during decoding, the performance is very close to
the theoretical bounds, even if $C_0$ is not good from a channel
coding perspective.

² Note that in fact $C_0^n + C_1^n + Z^n$ defines a binary symmetric channel
both in the case with the codewords of $C_1$ as input (and known $c_0^n$) and
in the case with the codewords of $C_0 + C_1$ as input.

III. Superposition Coding for Wyner-Ziv Coding 

The superposition approach for writing on dirty paper is 
possible because of the additive effect of the interference 
signal (known at the encoder). Let us consider a pair of
correlated random variables $(X, Y)$ such that

$$X = Y + Z ,$$

for some $Z$ independent from $Y$; $X$ is the input to the encoder
and $Y$ is the side information (known at the decoder); $\hat{X}$
denotes the reconstruction of $X$ at the decoder. The alphabet
over which these random variables take values is a finite
group $\mathcal{G}$ with $2^l$ elements ($l \ge 1$), "+" indicates the sum
over this group, $Y \sim \mathcal{U}(\mathcal{G})$ (i.e. is uniformly distributed over
$\mathcal{G}$), $Z \sim p(z)$, and $\hat{X}$ is subject to the distortion constraint
$E[d_H(X^n, \hat{X}^n)/n] \le D$.

The two codes $C_0 \subset \mathcal{G}^n$ and $C_1 \subset \mathcal{G}^n$ are constructed
at rates $R_0$ and $R_1$ (bit/symbol) by random i.i.d. selection
according to distributions $\mathcal{U}(\mathcal{G})$ and $q(\cdot)$, respectively.

The encoder looks for a codeword $(c_0^n + c_1^n)$, with $c_i^n \in C_i$,
that is (strongly) jointly typical with $x^n$, and sends
$c_1^n$ to the decoder (if there are no jointly typical codewords,
a random codeword is sent). There is an encoder error if
$d_H(x^n, c_0^n + c_1^n)/n > D$.

Theorem 1: The probability of having an encoder error
vanishes as $n \to \infty$ if

$$R_0 > l - H(q * d)$$
$$R_0 + R_1 > R_U(D) ,$$

where $d(\cdot)$ is the distribution which takes zero with probability
$1 - D$ and all other symbols with probability $D/(2^l - 1)$, and
$R_U(D) = H(X) - H(d)$ is the Hamming-distortion
rate-distortion function of the uniform random variable.³

Proof: Asymptotically, $x^n$ takes values on a set of $2^{nH(X)}$
elements, and for a fixed $c_0^n$, the code $C_1$ covers at most
$2^{nH(q*d)}$ of them within distortion $D$. Hence, there must be at
least $2^{nH(X) - nH(q*d)}$ codewords in code $C_0$. Then, since there
are at most $2^{nH(d)}$ covered elements within balls of distortion
$D$, the two codes must provide at least $2^{nH(X) - nH(d)} =
2^{nR_U(D)}$ codewords. Once we have these two conditions the
probability of finding a typical $(c_0^n + c_1^n)$ approaches one and
the distortion constraint is not violated. This may be proved
with an argument similar to the one used in the standard proof
of the achievability of the rate-distortion function [19].

³ $d(\cdot)$ maximizes $H(d)$ and $H(q * d)$ over all distributions with $d(0) =
1 - D$.

Fig. 1. Region in which there are neither encoder nor decoder errors.

The decoder receives the codeword $c_1^n = x^n - c_0^n + \tilde{z}^n$,
where $\tilde{Z}$ is independent from $X$ and distributed as $d(\tilde{z})$
(it corresponds to the subtractive noise in the equivalent
symmetric test channel between $X$ and $\hat{X}$). Then it computes

$$[y^n - c_1^n]_{C_0} = [c_0^n - (z^n + \tilde{z}^n)]_{C_0} \stackrel{c}{=} z^n + \tilde{z}^n \qquad (1)$$

and finally reconstructs

$$\hat{x}^n = y^n + [y^n - c_1^n]_{C_0} \stackrel{c}{=} x^n + \tilde{z}^n .$$

The equality marked $\stackrel{c}{=}$ in (1) is conditional on correct decoding (no decoding
errors), i.e. it holds only if there is only one codeword in $C_0$
jointly typical with $(y^n - c_1^n)$.
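
The algebra of the decoding steps can be verified symbol by symbol. The following sketch is an illustration only: it assumes the cyclic group of order 4 as the alphabet (the analysis requires just a finite group with $2^l$ elements), draws all vectors at random instead of from actual codes, and assumes that the search over the first code returns the correct codeword. The helper names are hypothetical, and `zt` stands for the quantization noise written with a tilde in the text.

```python
import random

M = 4  # alphabet Z_4, i.e. l = 2 (an assumption of this sketch)

def add(a, b):
    """Component-wise group sum."""
    return [(u + v) % M for u, v in zip(a, b)]

def sub(a, b):
    """Component-wise group difference."""
    return [(u - v) % M for u, v in zip(a, b)]

def rand_vec(rng, n):
    return [rng.randrange(M) for _ in range(n)]

rng = random.Random(1)
n = 16
y, z, c0, zt = (rand_vec(rng, n) for _ in range(4))

x = add(y, z)                 # correlation model X = Y + Z
c1 = add(sub(x, c0), zt)      # encoder output, so that c0 + c1 = x + zt

v = sub(y, c1)                # decoder input to the search over C0
noise = sub(c0, v)            # [v]_{C0} when the search returns c0
x_hat = add(y, noise)         # reconstruction

print(noise == add(z, zt), x_hat == add(x, zt))
```

Both comparisons hold for any choice of the vectors, which shows that the reconstruction error equals the quantization noise whenever the first codeword is recovered.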

Theorem 2: The probability of having a decoder error
vanishes as $n \to \infty$ if

$$R_0 < C_{p*d} ,$$

where $C_{p*d} = l - H(p * d)$ is the (unconstrained) capacity of
an additive channel on $\mathcal{G}$ with noise distributed as $p * d(\cdot)$.

Proof: Asymptotically, the equivalent noise $(z^n + \tilde{z}^n)$ is
distributed according to $p * d(\cdot)$ (note that $\tilde{Z}$ is independent
from $Z$), and hence takes values on a set having $2^{nH(p*d)}$
elements. Hence, there can be at most $2^{nH(X) - nH(p*d)} =
2^{nC_{p*d}}$ non-overlapping codewords in $C_0$. Again, once this
condition is met the achievability of no decoding errors may
be proved with an argument similar to the one used in the
standard proof of the achievability of channel capacity [19].

The rate region where there are no errors is not empty for
any $q(\cdot)$ such that $H(q * d) > H(p * d)$ and is shown in Fig. 1.
In the following, we will particularize this result for the doubly 
symmetric binary and for the Gaussian case, and show that the 
Wyner-Ziv bound can be achieved in both cases. 
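
The bounds of Theorems 1 and 2 can be evaluated for a concrete alphabet. The sketch below assumes the cyclic group of order $2^l$ (so that the convolution is a cyclic convolution) and uses illustrative distributions; the function names and numerical values are hypothetical and not taken from the analysis above.

```python
from math import log2

def entropy(dist):
    """Entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in dist if p > 0)

def cyclic_conv(a, b):
    """Distribution of the sum of independent variables on Z_m."""
    m = len(a)
    return [sum(a[i] * b[(k - i) % m] for i in range(m)) for k in range(m)]

def distortion_dist(m, D):
    """d(.): zero with probability 1 - D, any other symbol with D/(m - 1)."""
    return [1 - D] + [D / (m - 1)] * (m - 1)

def rate_region(p, q, D):
    """Bounds on R0 and R0 + R1 from Theorems 1 and 2."""
    m = len(p)
    l = log2(m)
    d = distortion_dist(m, D)
    return {
        "R0_min": l - entropy(cyclic_conv(q, d)),   # Theorem 1
        "R0_max": l - entropy(cyclic_conv(p, d)),   # Theorem 2
        "sum_min": l - entropy(d),                  # Theorem 1, R_U(D)
    }

p = [0.85, 0.05, 0.05, 0.05]     # noise Z on Z_4 (illustrative values)
q = [0.25, 0.25, 0.25, 0.25]     # a uniform q maximizes H(q * d)
region = rate_region(p, q, D=0.05)
R1_min = region["sum_min"] - region["R0_max"]   # equals H(p * d) - H(d)

print(region, R1_min)
```

Choosing $R_0$ just below its upper bound gives the smallest admissible $R_1$, namely $H(p * d) - H(d)$, which is the quantity specialized to the binary case in the next subsection.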

A. Binary Sources 

In the case $\mathcal{G} = GF(2)$, where $Z \sim \mathcal{B}(p)$ and $d \sim \mathcal{B}(D)$,
the rate-distortion function and the channel capacity involved
in the calculation of the rate region equal

$$R_U(D) = 1 - H(D)$$
$$C_{p*d} = 1 - H(p * D) ,$$

respectively. Hence, the lowest achievable value for $R_1$ is

$$R_1^* = R_U(D) - C_{p*d} = H(p * D) - H(D) ,$$

which is the rate-distortion function with side information for all
distortions $0 \le D \le D' < p$ [4]. The rate-distortion function
for $D' < D \le p$ is achieved by time-sharing between the two
working points $(H(p * D') - H(D'), D')$ and $(0, p)$.⁴
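
The time-sharing construction can be made concrete numerically. The sketch below is a hedged illustration: it takes $D'$ to be the point at which the tangent to $g(D) = H(p * D) - H(D)$ passes through $(p, 0)$, locates it by bisection on the sign of the tangent intercept, and then evaluates the resulting rate. The function names, the tolerance, and the value of $p$ are assumptions of this example.

```python
from math import log2

def H(u):
    """Binary entropy in bits."""
    return 0.0 if u in (0.0, 1.0) else -u * log2(u) - (1 - u) * log2(1 - u)

def conv(p, D):
    """Binary convolution p * D."""
    return p * (1 - D) + D * (1 - p)

def g(p, D):
    """Rate of the superposition scheme, H(p * D) - H(D)."""
    return H(conv(p, D)) - H(D)

def g_prime(p, D):
    """Derivative of g with respect to D, for 0 < D < 1."""
    a = conv(p, D)
    return (1 - 2 * p) * log2((1 - a) / a) - log2((1 - D) / D)

def tangent_point(p, tol=1e-12):
    """Find D' in (0, p) with g'(D') = g(D') / (D' - p)."""
    lo, hi = 1e-12, p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Value at D = p of the tangent drawn at mid; negative left of D'.
        if g(p, mid) + g_prime(p, mid) * (p - mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def wyner_ziv_rate(p, D):
    """g(D) up to D', then time sharing with the zero-rate point (0, p)."""
    Dp = tangent_point(p)
    if D <= Dp:
        return g(p, D)
    if D >= p:
        return 0.0
    return g(p, Dp) * (p - D) / (p - Dp)

p = 0.25                                   # illustrative crossover probability
print(tangent_point(p), wyner_ziv_rate(p, 0.05), wyner_ziv_rate(p, 0.2))
```

For distortions between $D'$ and $p$ the time-shared rate lies below $g(D)$, which is why the superposition scheme alone is not sufficient in that range.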

B. Gaussian Sources 

Assume now that $\mathcal{G} = \mathbb{R}$, $Y \sim \mathcal{N}(0, P_Y)$, $Z \sim \mathcal{N}(0, P_Z)$,
$C_0 \sim \mathcal{N}(0, P_0)$, and $C_1 \sim \mathcal{N}(0, Q)$. The constraint is given
in terms of the squared Euclidean distance. In order to analyze
the Gaussian case, which is not discrete, some care should be
taken because, while the test channel between
$\hat{X}$ and $X$ is still additive, the equivalent channel between $X$
and $\hat{X}$ is additive only if a suitable scaling factor is introduced

[9].

In this case, assuming that $U \sim \mathcal{N}(0, D)$ is a dither signal
known to both encoder and decoder, and independent from
all other variables, the encoder sends the $c_1^n$ corresponding to
$(\beta x^n + u^n)$, i.e. the decoder receives

$$c_1^n = \beta x^n + u^n - c_0^n + \tilde{z}^n ,$$

where $\tilde{Z}$ is (a scaled version of) the additive noise in the
channel from $X$ to $\hat{X}$, and is independent from $X$ (and from
$Z$). If the rate $R_0$ and the sum $R_0 + R_1$ are high enough, there
exist codes such that $\tilde{Z}$ has maximum power $D$ (no encoder
error).

The decoder evaluates

$$[\beta y^n + u^n - c_1^n]_{C_0} = [c_0^n - (\beta z^n + \tilde{z}^n)]_{C_0} \stackrel{c}{=} \beta z^n + \tilde{z}^n \qquad (2)$$

and finally reconstructs

$$\hat{x}^n = y^n + \beta [\beta y^n + u^n - c_1^n]_{C_0} \stackrel{c}{=} x^n + (\beta \tilde{z}^n - (1 - \beta^2) z^n) .$$

If $\beta = \sqrt{1 - D/P_Z}$, the power of $(\beta \tilde{z}^n - (1 - \beta^2) z^n)$ is
minimized and equals exactly $D$; the power of $(\beta z^n + \tilde{z}^n)$
equals $P_Z$. Hence, if the rate $R_0$ is less than a certain
threshold, we can have correct decoding in (2), i.e. no decoding
error. The minimum achievable rate $R_1$ in this case can be
asymptotically computed with a geometric argument: the final
goal is to cover each ball related to a codeword of $C_0$ (which
has at least power $P_Z$) with as few balls of power
$D$ as possible, each one of them related to one codeword of $C_1$ (for which
$Q + D > P_Z$). Finally we obtain

$$R_1^* = \frac{1}{2} \log_2\left(\frac{P_Z}{D}\right) ,$$

which equals the rate-distortion function with side information
in the Gaussian case. The power $P_0$ must be large enough
for the codewords of $C_0 + C_1$ to cover all the space in which
$(\beta x^n + u^n)$ asymptotically lies.
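
The two power identities used in the Gaussian case are easy to check numerically. The sketch below assumes that the correlation noise (power $P_Z$) and the quantization noise (power $D$) are uncorrelated; the function names and the numerical values are hypothetical. It verifies only the stated equalities and the final rate expression, not the minimality claim.

```python
from math import log2, sqrt

def beta(D, Pz):
    """Scaling factor beta = sqrt(1 - D / Pz)."""
    return sqrt(1 - D / Pz)

def decoding_noise_power(b, D, Pz):
    """Power of b*Z + Zt, with Z of power Pz and Zt of power D."""
    return b ** 2 * Pz + D

def reconstruction_error_power(b, D, Pz):
    """Power of b*Zt - (1 - b^2)*Z."""
    return b ** 2 * D + (1 - b ** 2) ** 2 * Pz

Pz, D = 1.0, 0.25                          # illustrative values
b = beta(D, Pz)
R1 = 0.5 * log2(Pz / D)                    # rate in bit/sample

print(decoding_noise_power(b, D, Pz), reconstruction_error_power(b, D, Pz), R1)
```

With this scaling the noise seen when decoding over the first code has power $P_Z$, while the final reconstruction error has power $D$.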

⁴ $D'$ is such that $\left.\frac{d[H(p * D) - H(D)]}{dD}\right|_{D = D'} = \frac{H(p * D') - H(D')}{D' - p}$.



IV. Implementation Issues 

As is clear from the previous section, with superposition
coding it is possible to find two independent codes that guarantee
the achievability of the Wyner-Ziv bound in the two examined
cases. In particular, the code $C_1$ must offer a good covering of
the space, i.e. it must be good for source coding purposes. On
the other hand, $C_0$ must take the role of both a good code for
source coding and a good code for channel coding. However,
even if $C_0$ were not very good from a source coding perspective,
it is reasonable to expect that, by increasing the power ($q$ or $Q$) of $C_1$, the
superposed code would still be good for source coding. It is instead
crucial that $C_0$ is a good channel code in order to avoid
decoding errors.

The best source codes available today are
trellis codes [20], which offer performance very close to
the rate-distortion function. For these codes, given a random
realization of the variable to be quantized, it seems crucial
that it is possible to search for the closest (i.e. the
most likely) codeword by examining all codewords. Hence, $C_1$
should in general be a good trellis code.
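
The exhaustive closest-codeword search mentioned above can be illustrated on a very small code. The sketch below is not a trellis code and does not use the Viterbi algorithm: as a stand-in, it enumerates all codewords of a (7,4) Hamming code, chosen here only because its size makes brute-force search trivial. The generator matrix and the function names are assumptions of this example.

```python
from itertools import product

G = [  # generator matrix of a (7,4) Hamming code (illustrative choice)
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def codewords(G):
    """Enumerate all codewords generated by G over GF(2)."""
    k, n = len(G), len(G[0])
    for msg in product((0, 1), repeat=k):
        yield tuple(sum(msg[i] * G[i][j] for i in range(k)) % 2 for j in range(n))

def distance(a, b):
    """Hamming distance."""
    return sum(u != v for u, v in zip(a, b))

def quantize(x, G):
    """Closest codeword to x, found by examining every codeword."""
    return min(codewords(G), key=lambda c: distance(x, c))

x = (1, 0, 1, 1, 0, 0, 1)        # source realization to be quantized
c = quantize(x, G)
print(c, distance(x, c))
```

Because this code is perfect, every length-7 binary vector lies within Hamming distance 1 of a codeword, so the search always returns a reconstruction with at most one error. For trellis codes the same minimum-distance search is carried out efficiently along the trellis instead of by enumeration.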

The best channel codes are instead turbo
and low-density parity-check (LDPC) codes. These codes
have a higher degree of randomness, which prevents (from a
computational complexity point of view) conducting the search
over all codewords to apply a maximum likelihood
approach exactly. However, there exist very good message-passing
algorithms over their factor graphs [21], which almost always
converge to the most likely output. Hence, $C_0$ should be one of
these codes.

Unfortunately, the message-passing algorithms fail to
converge when the input distribution is not unimodal and centered
over an actual codeword, as happens in quantization. In
the superposition coding approach, in principle, we should be
able to quantize the source outcome over $C_0$ and successively
quantize a residual over $C_1$ (this would be similar to successive
interference cancellation used in the MAC case). For the
reason mentioned above, the currently available algorithms do
not allow the first quantization to be performed. An approach in
which $C_0$ is chosen to be a convolutional code was presented
in [22] and achieved a 3 to 4 dB gap with respect to the
Wyner-Ziv bound. That gap is due to the fact that the performance of
convolutional codes is somewhat far from the channel capacity.

For practical implementation of the superposition approach,
which still remains interesting for the possibility of using
independent codes, it will be necessary to find good sparse
codes and develop good algorithms for lossy quantization over
them. One of the first works showing that iterative algorithms
may work for quantization too appeared in [23]: that result
was obtained by duality with good codes for binary erasure
channels. More recently, schemes based on survey propagation
were adapted from the field of statistical physics in order to
perform data compression as well [24]. Among the sparse codes
that are currently under investigation for source coding, there
is a high interest in low-density generator matrix (LDGM)
codes, which are shown to achieve the rate-distortion bound
[25], [26]. Nevertheless, general algorithms which allow for
practical utilization of these codes have not appeared yet.
Other somewhat more practical approaches have appeared in 
[27], [28], [29].

V. Conclusion 

In this paper we discussed the superposition coding ap- 
proach for the problem of source coding with side information 
at the decoder. For the case of a general additive-symmetric 
discrete correlation channel between the (uniform) side infor- 
mation and the source, we derived a rate region for the two 
independent superposed codes in which the desired distortion 
bound can be achieved. We showed that in the binary case 
the Wyner-Ziv bound is achievable, and extended the same 
result to the Gaussian case with Gaussian side information. 
Finally, we discussed the implementation issues involved in 
this scheme, which requires quantization over a code which 
must be a "good" code from a channel coding perspective. 